DETAILED ACTION

1. This action is responsive to communications: the application filed on 11/15/00,
and the IDSs filed on 4/23/01, 6/3/02, 4/8/03 and 10/17/03. 

2. Claims 1-39 are pending in the case. Claims 1, 37, 39 are independent claims.

Information Disclosure Statement 

3. The information disclosure statement filed 10/17/03 fails to comply with 37 CFR
1.98 since the copy of the IPER has not been considered as prior art. It has been
placed in the application file, but the information referred to therein has not been 
considered. 

Claim Rejections - 35 USC §112 

4. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

5. Claims 15, 19, 23 are rejected under 35 U.S.C. 112, second paragraph, as being
indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicant regards as the invention. 

Regarding claim 15, the claimed language is not proper when addressing that "the 
information retrieval method is a text classification method." Retrieving information is 
getting data from a database based on a request whereas text classifying is for putting 
text in different groups based on different criteria such as topic, type, etc. The two 
methods, therefore, are different, and one cannot be the other.

Regarding claim 19, it is unclear what the document contains and how the two words
relate to the user-specified minimum number of times when the claim states that
"wherein the sequences of at least two words are considered as appearing in a
document when the document contains the sequence of at least two words at least a
user-specified minimum number of times."

6. Claim 23 recites the limitation "wherein the monotonic function is the number of 
words in the phrase" in lines 1-2. There is insufficient antecedent basis for this limitation 
in the claim since claim 21 on which claim 23 is dependent, does not mention "the 
monotonic function." It is suggested that Applicants change the dependency of claim 23 
to be dependent on claim 22, which mentions the monotonic function. 

Double Patenting 

7. Claim 20 is objected to under 37 CFR 1.75 as being a substantial duplicate of
claim 19 since the only difference between the two claims is that claim 19 claims "the 
number of times" (line 4) and claim 20 claims "frequency" (line 4), where the number of 
times and the frequency have the same meaning. When two claims in an application 
are duplicates or else are so close in content that they both cover the same thing, 
despite a slight difference in wording, it is proper after allowing one claim to object to the 
other as being a substantial duplicate of the allowed claim. See MPEP § 706.03(k). 

Claim Rejections - 35 USC § 103



8. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

9. Claims 1-2, 10-21, 26-29, 37-39 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Gomes et al. (US Pat No. 6,615,209 B1, 9/2/03, filed 10/6/00, priority
2/22/00).

Regarding independent claim 1, Gomes discloses: 

- initially, selecting distinctive features contained in the collection of documents
(col 3, lines 33-43, col 7, lines 43-56: the query-relevant parts extracted from the
documents are distinctive features of the documents since the query-relevant
parts include specific information common to the documents; though Gomes
does not explicitly mention the collection of documents, the fact that the
query-relevant parts are extracted from a plurality of documents suggests that these
documents are in a collection for extracting)

- for each pair of documents having at least one distinctive feature in common,
comparing the distinctive features of the documents to determine whether the
documents are duplicate or near-duplicate documents (col 3, line 33 to col 4, line
10, col 2, lines 38-56, col 7, lines 43-56: comparing each two documents for
similarity based on the query-relevant parts referred to as "snippets" where the
documents found can be duplicate, or duplicate with slight change, which means
near-duplicate)

Gomes does not explicitly disclose that for each document, identifying the distinctive 
features contained in the document. 

However, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to have modified Gomes to include identifying the distinctive
features contained in each document since the fact that the query-relevant parts are 
extracted from the plurality of documents suggests that the query-relevant parts are 
identified in each document before being extracted. 

Regarding claim 2, which is dependent on claim 1, Gomes discloses that the method is
applied to removing duplicates in document collections (figure 9, #930, col 8, lines 37-60:
the fact that the duplicate removal management process uses query-relevant information to
extract query-relevant information from documents indicates that Gomes' method is
applied to removing duplicates in a plurality of documents, which are document
collections).

Regarding claim 10, which is dependent on claim 1, Gomes discloses that the method is 
applied to creating a document index for use with a query system to efficiently find 
documents in response to a query which contains a particular phrase or excerpt (col 6, 
lines 10-27: "...a crawling process gets content from various sources accessible and 
stores such content... an automated indexing/sorting process may access the stored 
content and may generate a content index ...a query processing process accepts
queries and returns query results based on the content index..."; the returned query 
results based on the content index suggest that the query contains at least some words 
of the content index). 

Regarding claim 11, which is dependent on claim 10, Gomes discloses that the 
document index can be utilized even if the particular phrase or excerpt was not 
recorded correctly in the document or in the query (col 6, lines 10-27: the fact that the 
queries are accepted and the query results are returned based on the content index
suggests the document index can be used no matter how a particular phrase is 
recorded in the query or document). 

Regarding claim 12, which is dependent on claim 1, Gomes discloses that the
distinctive features appear in a different order in each of the documents (col 13, lines 1-22:
"...the word frequencies of the query-relevant part ...two files with the same words in
different orders would appear to be identical").
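
As a rough illustration of the property relied on in the quoted passage (a sketch only, not Gomes' actual implementation), comparing two texts by word frequencies ignores word order. The function names and the whitespace tokenization below are assumptions made for this example.

```python
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercased, whitespace-separated word occurs."""
    return Counter(text.lower().split())


def same_word_frequencies(text_a, text_b):
    """Return True when both texts use the same words the same number of times."""
    return word_frequencies(text_a) == word_frequencies(text_b)
```

Under this comparison, two snippets that contain the same words in different orders are indistinguishable, which is the sense in which the distinctive features may appear in a different order in each document.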

Regarding claim 13, which is dependent on claim 1, Gomes discloses that the distinctive
features are distinctive text fragments from the document in the document collection
(col 7, lines 50-56; col 10, lines 56-67: the query-relevant information or the segments
surrounding keyword occurrences are text fragments from the documents that show 
distinctive features of the document). 

Regarding claim 14, which is dependent on claim 13, Gomes discloses that the method
is applied to information retrieval methods (col 5, line 66 to col 6, line 2; col 7, lines 28- 
40). 

Regarding claim 15, which is dependent on claim 14, Gomes does not disclose that the 
information retrieval method is a text classification method. However, it would have 
been obvious to one of ordinary skill in the art at the time the invention was made to have
modified Gomes to include the fact that the information retrieval method is a text 
classification method since the process of retrieving information from a database would 
be performed faster based on the text classifying of documents in the database where 
the documents are stored according to various topics or types. 

Regarding claim 16, which is dependent on claim 14, Gomes discloses that the 
information retrieval method assumes word independence, and the distinctive text 
fragments are added to an index set (col 6, lines 10-27: "...a crawling process gets 
content from various sources accessible and stores such content... an automated 
indexing/sorting process may access the stored content and may generate a content 
index ...a query processing process accepts queries and returns query results based on 
the content index..."; the returned query results based on the content index suggest that 
the content index contains at least some words of the query that is a distinctive text 
fragment). 

Regarding claim 17, which is dependent on claim 13, Gomes discloses that the
distinctive text fragments are sequences of at least two words that appear in documents 
in the document collection (col 10, lines 56-67: the segments or query-relevant 
information show that the distinctive text fragments are sequences of at least two 
words). Gomes does not disclose that the distinctive text fragments appear in a limited 
number of documents in the document collection. However, it would have been obvious
to one of ordinary skill in the art at the time the invention was made to have modified
Gomes so that the distinctive text fragments appear in a limited number of documents in
the document collection since only some of the documents, not all of them, have text
fragments that include the keywords in the query. Therefore,
the number of documents is limited.

Regarding claim 18, which is dependent on claim 14, Gomes discloses that if one
distinctive text fragment is contained within another distinctive text fragment within the
same document, only the longest distinctive text fragment is considered as a distinctive
feature (col 10, lines 44-67: the use of segments surrounding keyword occurrences, or
keyword-in-context summaries, suggests that the segment is considered as the
longest distinctive text fragment since it includes the query-related information, which is
the shorter distinctive text fragment).

Regarding claim 19, which is dependent on claim 17, Gomes discloses that the 
sequences of at least two words are considered as appearing in a document when the 
document contains the sequence of at least two words at least a user-specified
minimum number of times (col 12, lines 18-35: the fact that a segment may be added to 
the query-relevant part QR only if it contains at least a predetermined number of 
occurrences of any of the keywords where a segment is a portion of a document 
suggests that the document contains the sequence of keywords and a specified 
minimum number of times of the occurrences of the keywords where it was obvious that
the predetermined number of occurrences can be defined by a user).

Regarding claim 20, which is dependent on claim 17, Gomes discloses that the
sequences of at least two words are considered as appearing in a document when the 
document contains the sequence of at least two words at least a user-specified 
minimum frequency (col 12, lines 18-35: the fact that a segment may be added to the 
query-relevant part QR only if it contains at least a predetermined number of 
occurrences of any of the keywords where a segment is a portion of a document 
suggests that the document contains the sequence of keywords and a specified 
minimum frequency of the occurrences of the keywords where it was obvious that the 
predetermined number of occurrences can be defined by a user).

Regarding claim 21, which is dependent on claim 17, Gomes discloses: 

- the highest scoring sequences that are found in at least two documents in the 
document collection are considered distinctive text fragments (col 12, lines 40-54:
the fact that only a predetermined number of the highest ranking segments
would be added to the query-relevant part QR suggests the highest ranking
segments added to the query-relevant part QR be considered as distinctive text 
fragments) 

Gomes does not explicitly disclose calculating a distinctive score for each sequence of 
at least two words. However, it would have been obvious to one of ordinary skill in the
art at the time the invention was made to have modified Gomes to include calculating
a distinctive score for each sequence of at least two words since once a segment
including sequences of words related to the query is ranked as a highest ranking
segment, the ranking process must be carried out based on the scores of a plurality of
segments. In other words, calculating a score for each sequence of words must be 
performed for the segment ranking. 

Regarding claim 26, which is dependent on claim 17, Gomes does not explicitly disclose 
that the limited number of documents is selected by a user. 
Instead, Gomes discloses that since the amount of text extracted influences a 
subsequent similarity measure, the tunable parameters 933 and 935 should be adjusted 
in concert (figure 9 and col 10, lines 44-50). Gomes further explains that "in general, 
the less information extracted, the more similar the documents may be found to be (so 
the similarity threshold should be set higher), or stated oppositely, the more information
extracted, the less similar the documents may be found to be (so the similarity threshold 
should be set lower)" (col 10, lines 51-55). 

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to have modified Gomes to include the user's selection for the limited
number of documents since the fact that a user can adjust the extraction parameters or the
similarity measure parameters for a desired result suggests a possibility for users to
select the limited number of documents for the adjustment.

Regarding claim 27, which is dependent on claim 17, Gomes does not explicitly disclose 
that the limited number is defined by a linear function of the number of documents in the 
document collection. 

However, as mentioned regarding claim 26 above, Gomes discloses that a user can select the
parameters in the program to adjust text extraction and the similarity measure (col 10, 
lines 44-55). 

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to have modified Gomes to incorporate a linear function of the number of
documents based on the adjusted parameters and the number of documents in the 
document collection. 

Regarding claim 28, which is dependent on claim 17, Gomes discloses that the
distinctive text fragments include glue words (col 10, line 56 to col 11, line 11: though
the keywords preferably do not include "stop words" or glue words such as "the", "it",
"and", "or", etc. for the search, the keywords are included in the snippets, which are the
segments surrounding the keywords; therefore, the segments surrounding the
keywords, that are equivalent to the distinctive text fragments, still include glue words).

Regarding claim 29, which is dependent on claim 17, Gomes does not explicitly disclose 
that the glue words do not appear at either extreme of the distinctive text fragments. 
However, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to have modified Gomes to include the feature that the glue words
do not appear at either extreme of the distinctive text fragments for the following reason.
Since the glue words do not convey much information or convey some type of Boolean
operations (col 11, lines 1-11), there is no need to include the glue words at either
extreme of the distinctive text fragments.

Regarding independent claim 37 and its dependent claim 38, Gomes discloses: 

- initially, selecting distinctive features contained in the collection of documents
(col 3, lines 33-43, col 7, lines 43-56: the query-relevant parts extracted from the
documents are distinctive features of the documents since the query-relevant
parts include specific information common to the documents; though Gomes
does not explicitly mention the collection of documents, the fact that the
query-relevant parts are extracted from a plurality of documents suggests that these
documents are in a collection for extracting)

- for each pair of documents having at least one distinctive feature in common,
comparing the distinctive features of the documents to determine whether the
documents are duplicate or near-duplicate documents (col 3, line 33 to col 4, line
10, col 2, lines 38-56, col 7, lines 43-56: comparing each two documents for
similarity based on the query-relevant parts referred to as "snippets" where the
documents found can be duplicate, or duplicate with slight change, which means
near-duplicate)

Gomes does not explicitly disclose that for each document, identifying the distinctive 
features contained in the document. 

However, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to have modified Gomes to include identifying the distinctive
features contained in each document since the fact that the query-relevant parts are 
extracted from the plurality of documents suggests that the query-relevant parts are 
identified in each document before being extracted. 

Gomes also does not disclose that the method is applied to a collection of text spans
where the text spans are sentences. However, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to have applied the identifying
of duplicate and near-duplicate documents in Gomes to identifying duplicate
and near-duplicate text spans, where the text spans are sentences, since it was obvious
that a document comprises a plurality of sentences. Accordingly, two documents
are identified as duplicates if they have duplicate sentences. Therefore, comparing the
distinctive features of the documents should be based on comparing the distinctive
features of the text spans, which are sentences included in a document. In other words,
Gomes inherently includes identifying duplicate and near-duplicate text spans, which
are sentences.

Independent claim 39 is for an apparatus of method claim 1, and is rejected under the
same rationale. 

10. Claims 3-7, 30 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Gomes as applied to claim 1 above, and further in view of Aiken (US Pat No. 6,240,409
B1, 5/29/01, filed 7/31/98).

Regarding claims 3 and 4, which are dependent on claim 1, Gomes does not disclose
explicitly that the method is applied to detecting plagiarism and to detecting copyright 
infringement. 

Aiken discloses a method for detecting the similarities between the two documents 
(abstract, col 3, lines 4-24) and applying the detecting of similarities for detecting 
plagiarism among a set of documents and providing copyright protection (col 18, lines 1-22).

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to have combined Aiken into Gomes for the following reason. Aiken
discloses applying the detecting of similarities of documents to detecting plagiarism and
providing copyright protection, thus motivating one to apply the duplicate determination in
Gomes to detecting plagiarism and providing copyright protection, since the duplicate
features of the two documents in Gomes are the same as the similarities between the
documents in Aiken and copyright protection is for preventing copyright
infringement.

Regarding claim 5, which is dependent on claim 1, Gomes does not disclose explicitly 
that the method is applied to determine the authorship of a document. 
As mentioned in claims 3-4 above, Aiken discloses applying the detecting of similarities 
for detecting plagiarism among a set of documents and providing copyright protection 
(col 18, lines 1-22). 

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to have combined Aiken into Gomes for the following reason. Aiken teaches
applying detecting the document similarities to detecting plagiarism and providing 
copyright protection, thus motivating to determine the authorship of a document, 
especially the duplicate documents in Gomes since both plagiarism and copyright 
protection are for confirming the real author of a document. 

Regarding claim 6, which is dependent on claim 1, Gomes does not disclose that the
method is applied to clustering successive versions of a document from among a 
collection of documents. 

Aiken discloses clustering successive versions of a document from among a collection 
of documents (figures 1a-b, 4a and col 10, line 4 to col 11, line 46).

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to have combined Aiken into Gomes since Aiken discloses clustering
documents based on similarities of the document contents, thus motivating one to utilize the
duplicate and near-duplicate features of documents in Gomes for clustering documents 
in a collection. 

Regarding claim 7, which is dependent on claim 1, Gomes does not disclose that the
method is applied to seeding a text classification or text clustering algorithm with sets of 
duplicate or near-duplicate. 

Aiken discloses clustering documents using a text clustering algorithm based on the 
matching of the documents in a collection (figures 1a-b, 4a, col 7, lines 17-35).

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to have combined Aiken into Gomes since Aiken discloses a text clustering
algorithm applied on the matched documents in a document collection thus motivating 
to utilize the duplicate features of documents in Gomes as the matching features of the 
documents for applying the clustering algorithm. 

Regarding claim 30, which is dependent on claim 1, Gomes does not disclose: 

- counting the number of distinctive features in common 

- wherein determining whether the pair of documents is duplicates or near-duplicates
includes the steps of:

• for each pair of documents, calculating an overlap ratio by dividing the
number of distinctive features in common by the smaller of the number of
distinctive features per document

• comparing the overlap ratio to a threshold and if the overlap ratio is
greater than the threshold, then the pair of documents are duplicates or
near-duplicates, otherwise the pair of documents are not duplicates or
near-duplicates

Aiken discloses a method for clustering documents based on detecting the similarities of 
the documents (abstract, figures 1a, 4a) where the similarities of each two documents
are determined by: 

- calculating an overlap ratio by dividing the number of distinctive features in
common by the smaller of the number of distinctive features per document (col
11, lines 1-14: "The similarity of two documents is defined by ratio C/T, where C
is the number of hashes the two documents have in common and T is the total
number of hashes taken of one of the documents, which can be the current
document or the smaller document...")

- counting the number of distinctive features in common (col 11, lines 1-14: 
calculating the ratio C/T inherently shows counting the number of distinctive 
features in common C) 

- comparing the overlap ratio to a threshold and if the overlap ratio is greater than
the threshold, then the pair of documents are duplicates or near-duplicates,
otherwise the pair of documents are not duplicates or near-duplicates (col 11,
lines 15-46: "if C/T is less than the threshold (e.g. a predetermined parameter),
the matches associated with the retrieved document are discarded ..."; the fact
that the matches are discarded if C/T is less than the threshold and only
documents having an interesting or significant number of matches with the
current document are retained suggests that the documents having a significant
number of matches with the current document have the overlap ratio C/T greater
than the threshold, which means these documents are similar or duplicates to the
current document, otherwise they are not)

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to have combined Aiken into Gomes since Aiken teaches calculating the ratio
C/T for determining the similarities or duplicates of documents, providing the advantage of
applying Aiken's calculating method for effectively determining the duplicates of the
documents.
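
For illustration only, the overlap-ratio test recited in claim 30 can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the application, Gomes, or Aiken: the function names, the representation of distinctive features as sets, the handling of empty feature sets, and the default threshold value of 0.8 are all hypothetical.

```python
def overlap_ratio(features_a, features_b):
    """Shared features divided by the smaller document's feature count."""
    a, b = set(features_a), set(features_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        # Assumption: a document with no distinctive features matches nothing.
        return 0.0
    return len(a & b) / smaller


def is_near_duplicate(features_a, features_b, threshold=0.8):
    """Duplicate or near-duplicate when the ratio exceeds the threshold.

    The default threshold is a hypothetical value chosen for this sketch.
    """
    return overlap_ratio(features_a, features_b) > threshold
```

With feature sets {"a", "b", "c"} and {"b", "c", "d", "e"}, two features are shared and the smaller set has three, so the ratio is 2/3. The pair would be treated as near-duplicates at a threshold of 0.5 but not at 0.8.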

11. Claims 8-9 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Gomes as applied to claim 1 above, and further in view of Armstrong (US Pat No.
6,356,633 B1, 3/12/02, filed 8/19/99).

Regarding claims 8 and 9, which are dependent on claim 1, Gomes does not disclose
that the method is applied to matching an email message with responses to the email
message, and to matching responses to an email message with the email message.

Armstrong discloses an email system that can access a database containing data and
information related to predefined keyword lists, predefined response templates, 
predefined responses, etc., where the keylists can be matched with the content of the 
fields associated with the email, such as the "TO", "FROM", "RE", date/time created, 
date/time sent, date/time received, and of course, the body of the email message itself 
(col 5, lines 7-18, figures 2A, 3A-B).

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to have combined Armstrong into Gomes since Armstrong discloses using
the matching of keylists and the content of the email fields to detect the relationship
between an email message and its response via the content of fields for sending and
receiving messages, thus motivating one to utilize the document duplicate features of Gomes,
where the duplicate features imply matching of the documents based on a distinctive
feature related to keywords in a query, for matching an email and the response to the
email and vice versa.

Allowable Subject Matter 

12. Claims 22, 24-25, 31-36 are objected to as being dependent upon a rejected 
base claim, but would be allowable if rewritten in independent form including all of the 
limitations of the base claim and any intervening claims. 

13. Claim 23 would be allowable if rewritten to overcome the rejection(s) under 35
U.S.C. 112, second paragraph, set forth in this Office action and to include all of the 
limitations of the base claim and any intervening claims. 

Conclusion

14. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 